A Modern Approach to Searching the World Wide Web: Ranking Pages by Inference over Content
Authors
Abstract
Hypertext-based webs such as intranets contain a vast amount of information pertaining to an enormous number of subjects. Such an environment is, however, organically grown and thus essentially structureless, and it exists in a constant state of flux. Finding useful information on a particular topic is therefore often a difficult task. Search engines were designed to ease the burden on individuals perusing the Web for specific topics. Traditionally, Web search engines have used straightforward, and relatively naïve, approaches to indexing and ranking pages pertaining to a particular subject. As our understanding of hyperlinked environments has improved, algorithmic tools have been developed that more effectively distill the plethora of information that exists within this environment. We briefly discuss the history of the World Wide Web, the approaches employed by “traditional” search engines, and how alternative techniques can improve upon older approaches. We find that new techniques build upon, rather than replace, previous approaches, and that the problem of searching the Web evolves as our understanding of the Web’s structure improves.
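The link-analysis tools alluded to above include PageRank-style ranking, which infers a page's importance from the hyperlink structure rather than from its text alone. The following is a minimal sketch of PageRank power iteration over a made-up toy graph; the graph, damping factor, and iteration count are illustrative assumptions, not details taken from the paper.

```python
# Minimal PageRank power-iteration sketch over a toy link graph.
# The graph and parameters below are illustrative assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                     # dangling page: spread rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:                                # share rank equally among outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
        rank = new
    return rank

toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
ranks = pagerank(toy_web)
# Page C receives links from both A and B, so it ends up ranked highest.
```

The iteration converges to the stationary distribution of a random surfer who follows links with probability `damping` and jumps to a random page otherwise.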
منابع مشابه
An Effective Method for Ranking of Changed Web Pages in Incremental Crawler
The World Wide Web is a large, global repository of text documents, images, multimedia, and much other information, collectively referred to as information resources. A large amount of new information is posted on the Web every day. A Web crawler is a program that fetches information from the World Wide Web in an automated manner. The crawler keeps visiting pages after the collection reaches its target size,...
A Probabilistic Model for Optimal Searching of the Deep Web
Since the advent of the World Wide Web, efficient searching of information on the Web has become a challenge for the Internet technology community. However, in earlier days, when the net was not the huge pool of information it is today, most information was stored in static HTML pages. Searching techniques were thus primarily centered around web crawling, indexing and ranking o...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download only domain-specific web pages; an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
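The frontier-ordering idea described in the snippet above can be sketched with a priority queue: instead of visiting URLs in FIFO order, a focused crawler dequeues the URL judged most relevant to the target topic. The scoring function below is a hypothetical keyword-overlap measure standing in for whatever classifier a real focused crawler would use; all names and URLs are made up for illustration.

```python
# Sketch of a focused-crawler frontier ordered by topical relevance.
# The relevance function is a toy stand-in, not a real classifier.
import heapq

def relevance(anchor_text, topic_terms):
    """Toy score: fraction of topic terms appearing in the anchor text."""
    words = set(anchor_text.lower().split())
    return sum(t in words for t in topic_terms) / len(topic_terms)

class Frontier:
    def __init__(self, topic_terms):
        self.topic_terms = topic_terms
        self.heap = []       # entries: (negated score, insertion order, url)
        self.counter = 0     # tie-breaker preserving insertion order
        self.seen = set()    # avoid re-enqueueing known URLs

    def push(self, url, anchor_text):
        if url in self.seen:
            return
        self.seen.add(url)
        score = relevance(anchor_text, self.topic_terms)
        heapq.heappush(self.heap, (-score, self.counter, url))
        self.counter += 1

    def pop(self):
        return heapq.heappop(self.heap)[2]   # most relevant URL first

frontier = Frontier(["web", "ranking"])
frontier.push("http://example.org/sports", "latest sports scores")
frontier.push("http://example.org/rank", "web page ranking methods")
# frontier.pop() returns the ranking page before the sports page.
```

Negating the score turns Python's min-heap into a max-priority queue, so the highest-scoring URL is always crawled next.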
Ignoring Irrelevant Pages in Weighted PageRank Algorithm using Text Content of the Target Page
The web is expanding day by day, and people generally rely on search engines to explore it. The web has created many challenges for information retrieval. The quality of the information extracted is one of the major issues to be taken care of, and current information retrieval approaches need to be modified to meet such challenges. While doing query-based searching, the search engines ...
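The idea in the title above, discounting links whose target pages are textually irrelevant, can be sketched as a pre-filtering step applied to the link graph before any link-based ranking runs. This is an illustrative simplification, not the cited paper's actual algorithm; the pages, query, and threshold are invented examples.

```python
# Illustrative sketch: drop links whose target page's text shares too
# little vocabulary with the query before running link-based ranking.
# Pages, query, and threshold are made-up examples.

def text_overlap(page_text, query):
    """Fraction of query words that appear in the page's text."""
    page_words = set(page_text.lower().split())
    query_words = set(query.lower().split())
    return len(page_words & query_words) / len(query_words)

def prune_graph(links, page_text, query, threshold=0.5):
    """Keep only links whose target is textually relevant to the query."""
    return {
        page: [t for t in targets
               if text_overlap(page_text[t], query) >= threshold]
        for page, targets in links.items()
    }

pages = {
    "A": "search engine ranking overview",
    "B": "cooking recipes and kitchen tips",
    "C": "web search ranking algorithms",
}
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
pruned = prune_graph(graph, pages, "search ranking")
# The link A -> B is removed: page B never mentions the query terms.
```

Running a PageRank-style computation on the pruned graph then concentrates rank on topically relevant pages, which is the effect the snippet above describes.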
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...